
Publication 538-01-1-696


RELATIONSHIP  BETWEEN  ACCEPTANCE-TEST 
RELIABILITY  AND  OPERATIONAL  RELIABILITY 


15  November  1966 


Prepared  for 

U.  S.  NAVAL  AMMUNITION  DEPOT 
CRANE,  INDIANA 
under  Contract  N164-11329 


ARINC RESEARCH CORPORATION
a subsidiary of Aeronautical Radio
2551 Riva Road
Annapolis, Maryland 21401


Approved for public release;
Distribution  Unlimited 


© 1966  ARINC  Research  Corporation 

Prepared  under  Contract  N164-11329  which 
grants  to  the  U.  S.  Government  a license 
to  use  any  material  in  this  publication 
for  Government  purposes. 




SUMMARY 

This  report  describes  an  investigation  of  the  relationships  between  equipment 
reliability  estimates  as  measured  during  manufacturer's  reliability  demonstration 
testing  and  as  measured  under  operational  conditions.  The  work  was  performed  by 
ARINC Research Corporation for the Naval Ammunition Depot, Crane, Indiana.

One  of  the  earliest  comprehensive  studies  of  reliability  was  performed  by 
the  Advisory  Group  on  Reliability  of  Electronic  Equipment  (AGREE).  Task  Group  3 
was  charged  with  the  development  of  "basic  requirements  for  tests  . . . which 
will  prove  conclusively  that  the  equipment  will  meet  the  minimum  acceptable  figure 
for  reliability  established  for  the  equipment  type". 

The study documented in this report indicated that there is a definite relationship between airborne equipment reliability estimates as measured during AGREE testing and as measured under operational conditions. This is the first time that such a relationship has been established by data from as many different equipments (18) as were used in this study.

The results indicate, with a data correlation of 0.81, that the relationship between operational MTBF and AGREE-test MTBF may not be 1-to-1, and is expressed as:

$$\theta_O = 0.017\,\theta_A^{1.76}$$

where

$\theta_O$ = MTBF measured under operational conditions
$\theta_A$ = MTBF measured under AGREE-test conditions

This relationship represents equipments whose MTBF values ranged from 12 to 301 hours during operational use and from 47 to 454 hours during AGREE testing.

Figure S-1, which is based on the above relationship, shows, for example, an MTBF estimate of 58 hours in operational use for an average AGREE result of 100 hours. However, since the available data were not sufficient to develop relationships by classes of equipments, this is an average figure for all types of equipments and may not hold for specific classes of equipments.





FIGURE S-1

RELATIONSHIP OF MTBF VALUES UNDER AGREE-TEST AND OPERATIONAL CONDITIONS

(Horizontal axis: MTBF Demonstrated by Manufacturer Under AGREE-Test Conditions (Hours))

Multiple  regression  analysis  was  the  basic  statistical  technique  used  in 
this  study.  Several  factors  other  than  the  AGREE  test  results  were  recognized  as 
pertinent to the accuracy of estimating expected field operational results. However, the data for all these factors were not available for each equipment used in the study. Consequently, only the following additional factors were considered:

• Complexity  of  equipment  (AEG  - Active  Element  Group  Count) 

• Average  mission  length  of  aircraft 

• Data  collection  methods 

• Number  of  failures  observed  during  operational  use  of  the  equipment 




CONTENTS

SUMMARY

1. INTRODUCTION

2. DATA SOURCES AND ACQUISITION

3. ANALYSES AND DISCUSSION OF RESULTS

3.1 Mathematical Model

3.2 Discussion of Results

3.2.1 Equipment Description

3.2.2 Environmental Descriptors

3.2.3 Mission and Application Descriptors

3.2.4 Influence of Data Source

3.2.5 Reliability Measures

3.2.6 Final Relationship


1. INTRODUCTION

In recent years, dialogue on reliability testing has generated two distinct standpoints whose divergence concerns not so much the basic principle of reliability testing as what precisely is being measured. From one standpoint, a figure of merit in terms of freedom from catastrophic malfunction will suffice. From the other standpoint, it is considered essential to have a realistic assessment of actual field experience; effects such as the interaction among parts that comprise a system, and environmental factors such as imperfect support, maintenance, and operation must be accounted for.

In essence, the two approaches provide respectively a manufacturer's view and a user's view of reliability. Both types of assessment have their use in the overall drive toward attaining better system value. One of the earliest comprehensive studies to recognize these differences was performed by the Advisory Group for Reliability of Electronic Equipment (AGREE). In the widely read report¹ published by this group, nine task groups discussed and presented recommendations on the specific aspects of a reliability program.

The results reported by Task Group 3 are particularly relevant to this report. This Group developed the basic requirements for the AGREE tests; these were intended to serve as the guidelines which could be used to assure achievement of the reliabilities considered acceptable for given equipments. The specification of the various environmental levels for reliability testing in the AGREE publication is an indication of the effort to minimize the differences between laboratory test results and field results.

As stated almost ten years ago, "... the reliability index obtained from the equipment under test will be a useful measure of the field reliability even though the test conditions only approximately simulate the combined environmental effects which may be obtained in field use. Thus, while the measured MTBF may differ somewhat from that prevailing during operational use, this discrepancy will be small in contrast to that due to errors of measuring technique, differences in application, field maintenance, etc. ..." Equally important is the statement on the same page of the report to the effect that the tests are designed to measure the inherent reliability of the equipment under test. From the above statements it can be concluded that AGREE test results should approximate field results only if there are no errors in measuring technique (in the field) and no differences in application or field maintenance (due to personnel, training, procedures, logistics, test equipment, and similar factors). However, it is essential that the user be assured that, when an equipment is used in the field, it will be at least as reliable as has been indicated during demonstration testing. To this end, this report documents a study to correlate AGREE laboratory results with field usage results and to establish the mathematical relationship between the two sets of results.

¹Advisory Group for Reliability of Electronic Equipments, Office of Assistant Secretary of Defense (R&D), June 1957.

Section  2 discusses  the  sources  and  acquisition  of  data  used  in  this  study. 
Section  3 discusses  the  analysis  of  the  data  and  the  results  of  the  analysis. 
Detailed  discussion  of  various  points  and  derivations  of  formulas  are  presented 
in  the  appendixes. 


2.  DATA  SOURCES  AND  ACQUISITION 

The scope of this study allowed only for the use of available data; no provisions were made for generating or collecting field data. The data sources were to be Navy material and ARINC Research Corporation's in-house reference material.

The data used were restricted to those from airborne equipments. Although AGREE-type data were located on 39 equipments, field operational data in sufficient quantity to yield meaningful MTBF values were available for only 20 of them, and of these, the actual number of failures observed was available for only eleven. A summary of the data used is presented in Appendix A.


3.  ANALYSES  AND  DISCUSSION  OF  RESULTS 


3.1  Mathematical  Model 

The relationships between field operational reliability and reliability under AGREE-test conditions, system parameters, and use factors were investigated by the statistical technique of multiple regression analysis. (See Appendix B for a discussion of this technique.) Regression was required because of the need to consider the effects of many variables simultaneously. The linear model was considered to be of sufficient accuracy for the study. The basic model of field usage reliability used for the regression analysis is:

$$Y_F = B_0 + B_1 Y_A + B_2 X_2 + \cdots + B_n X_n$$

where

$X_i$ are system and use characteristics

$B_i$ are the true regression coefficients relating these characteristics to reliability

$Y_F$ and $Y_A$ are measures of field usage and AGREE reliability

$B_1$ is the true regression coefficient relating the measure of AGREE reliability to field usage reliability.

In this study, $\ln\theta$* was used as the measure of reliability, and the basic regression equation became:

$$\ln\theta_F = b_0 + b_1 \ln\theta_A + b_2 X_2 + \cdots + b_n X_n$$

where

$b_i$ is a statistical estimate of $B_i$ and

$\theta$ is the mean time between failures of an equipment.

In the analysis, $\ln\theta_F$, $\ln\theta_A$, $X_2$, $X_3$, ..., $X_n$ are known for each equipment, and the regression analysis is essentially the solution of the set of such equations for the best values of the coefficients. For a prediction of a new equipment, the values of $\ln\theta_A$, $X_2$, ..., $X_n$ are substituted and the values for $b_0$, $b_1$, ..., $b_n$ determined above are used to compute the value of $\ln\theta_F$, where $\theta_F$ is the expected field mean time between failures.

*ln = natural log
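The fitting and prediction steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the program used in the study: the function names are hypothetical, NumPy's general least-squares solver stands in for whatever computation the authors performed, and the data are invented, noise-free values generated from an assumed power law.

```python
import numpy as np

def fit_log_linear(theta_field, theta_agree, extra=None):
    """Least-squares fit of ln(theta_F) = b0 + b1*ln(theta_A) + b2*X2 + ...

    `extra` is an optional (m, k) array of additional factors X2, X3, ...
    Returns the coefficient vector [b0, b1, b2, ...].
    """
    y = np.log(np.asarray(theta_field, dtype=float))
    cols = [np.ones_like(y), np.log(np.asarray(theta_agree, dtype=float))]
    if extra is not None:
        extra = np.asarray(extra, dtype=float).reshape(len(y), -1)
        cols.extend(extra.T)  # one design-matrix column per extra factor
    design = np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

def predict_field_mtbf(coeffs, theta_agree, extra=()):
    """Substitute ln(theta_A) and the extra factors, then undo the log."""
    x = np.concatenate(([1.0, np.log(theta_agree)],
                        np.asarray(extra, dtype=float)))
    return float(np.exp(x @ coeffs))

# Hypothetical data generated from theta_F = 0.5 * theta_A**1.2 (not study data).
agree = np.array([50.0, 100.0, 200.0, 400.0])
field = 0.5 * agree**1.2
b = fit_log_linear(field, agree)
estimate = predict_field_mtbf(b, 100.0)
```

Because the fit is linear in the logarithms, the intercept `b[0]` corresponds to the logarithm of the multiplier in the power-law form and `b[1]` to its exponent.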

3.2  Discussion  of  Results 

Several factors other than the AGREE test results were recognized as pertinent to the accuracy of estimating expected field operational results. These included descriptions of equipment, environment, mission, and application; data collection methods; and reliability measures. However, the data for all these factors were not available for each equipment used in the study. Consequently, only the following additional factors were considered:

• Complexity  of  equipment  (AEG) 

• Average  mission  length  of  aircraft 

• Data  collection  methods 

• Number  of  failures  observed  during  operational  use  of  the  equipment 

3.2.1  Equipment  Description 

The only equipment description used in the regression runs was complexity as measured by Active Element Group (AEG) count. In most instances this factor was not significant. That is, no reduction in the variability in the estimates of operational reliability was obtained by including this factor with the AGREE test results.

3.2.2 Environmental Descriptors

Since all the AGREE tests were run at the same test level (all equipments in the study were airborne equipments), the environmental variation was in too narrow a range to be used as a factor. Environmental data were not available for the operational data.

3.2.3  Mission  and  Application  Descriptors 

When  the  final  relationship  was  derived,  estimates  for  equipments  that 
were  used  on  more  than  one  aircraft  were  used  individually  — the  average  flight 
length  of  the  aircraft  was  used  to  distinguish  one  estimate  from  another. 

3.2.4  Influence  of  Data  Source 

The sample size was not sufficient to establish positively the degree to which the regression results were dependent on data source. However, data on 18 equipments, differently grouped according to source, were used to make four regression runs. The results are shown in Table 1; the specific data used for each run are identified in Table A-3, Appendix A. Briefly, the first two runs used data from all 18 equipments and differed only in respect to the operational MTBF values used for four equipments for which such data were available from two sources. The differences in the data sets did not appear to affect the regression statistics significantly. Runs 3 and 4 represented two groups of equipments distinguished by their different sources of data for both operational MTBF and AGREE-test MTBF. Although the regression coefficients are significantly different in these cases, it must be remembered that different equipments are represented in these runs.

TABLE 1

REGRESSION STATISTICS GENERATED WITH DIFFERENT DATA SUBSETS

Run      Source of Data Sets                Number of     Correlation   Standard Error   Relationship
Number   Operational       AGREE            Equipments    Coefficient   of Estimate
                                            Represented

1        ARINC and NATSF   Mfr and NATSF    18            0.73          0.75             $\theta_O = 0.8[?]\,\theta_A^{0.93}$
2        ARINC and NATSF   Mfr and NATSF    18            0.73          0.80             $\theta_O = 0.6[?]\,\theta_A^{[?].98}$
3        ARINC             Mfr              8             0.69          0.80             $\theta_O = 0.92[?]\,\theta_A^{[?].24}$
4        NATSF             NATSF            6             0.86          0.77             $\theta_O = 0.096\,\theta_A^{[?].32}$

[?] marks characters that are illegible in the source.

3.2.5  Reliability  Measures 

The  data  that  could  be  assembled  within  the  scope  of  the  program  were  not 
sufficiently  detailed  on  a sufficient  number  of  equipments  to  warrant  performing 
regression  runs  on  reliability  measurements  other  than  MTBF  (e.g.,  total  failures, 
relevant  failures,  complaints). 

3.2.6  Final  Relationship 

For eleven of the equipments, data were available on the actual number of failures observed during operational use. This information provided valuable weighting factors with which to increase the accuracy of the regression results.* The run identified in Table 1 as Run 1 was repeated with the data appropriately weighted; the following regression statistics were generated:

Relationship: $\theta_O = 0.017\,\theta_A^{1.76}$
Correlation Coefficient: r = 0.81
Standard Error of Estimate: σ = 0.70

*See Section 3.3 for discussion of the need for transforming the dependent variable and for using weighting factors.


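The correlation coefficient is a measure of how much of the original variation observed in the dependent variable (operational MTBF) can be explained by the independent variable (AGREE-test MTBF); its square is the proportion of that variation explained. The standard error of estimate is a measure of the accuracy with which estimates may be made for new observations. Because the regression was performed on ln θ, the standard error is expressed in natural-log units. The relationship represents equipments with MTBF's ranging from 12 to 301 hours during operational use and from 47 to 454 hours during AGREE testing.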


Figure  1 is  derived  from  these  final  statistics  and  indicates  that,  on  the 
average,  equipments  whose  MTBF's  are  less  than  200  hours  under  AGREE-test  conditions 
will  exhibit  operational  MTBF's  that  are  lower  than  the  AGREE  results.  However, 
due  to  the  small  sample  size  used  in  developing  the  relationship,  this  conjecture 
still  requires  verification. 


FIGURE 1

RELATIONSHIP OF MTBF VALUES UNDER AGREE-TEST AND OPERATIONAL CONDITIONS

(Horizontal axis: MTBF Demonstrated by Manufacturer Under AGREE-Test Conditions (Hours))


This trend, illustrated by the average line in the figure, is probably associated with the fact that equipments that exhibit high reliability at the time of manufacture are less likely to be degraded in the field by maintenance-induced failures, whereas equipments that start with low reliability fail more often in the field and, therefore, are more susceptible to maintenance-induced failures.
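The statement that equipments below roughly 200 hours of AGREE-test MTBF show lower operational MTBF can be checked directly from the final relationship. The sketch below is illustrative only: the function names are hypothetical, and it uses the rounded coefficients 0.017 and 1.76 as printed. With those rounded values a 100-hour AGREE result evaluates to about 56 hours, slightly below the 58 hours quoted in the text, which presumably reflects unrounded coefficients, and the two MTBF values are equal at a little over 200 hours.

```python
COEFFICIENT = 0.017  # multiplier in the final relationship, as printed
EXPONENT = 1.76      # exponent on the AGREE-test MTBF, as printed

def operational_mtbf(theta_agree):
    """Estimated operational MTBF (hours) from AGREE-test MTBF (hours)."""
    return COEFFICIENT * theta_agree ** EXPONENT

def crossover_mtbf():
    """AGREE-test MTBF at which the estimate equals the AGREE value.

    Setting theta = a * theta**b gives theta = (1/a)**(1/(b - 1)).
    """
    return (1.0 / COEFFICIENT) ** (1.0 / (EXPONENT - 1.0))

estimate_at_100 = operational_mtbf(100.0)
breakeven = crossover_mtbf()
```

Below `breakeven` the estimate falls under the AGREE-test value; above it the estimate exceeds the AGREE-test value, which is the trend drawn as the average line.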


If average flight length and AEG count are used in conjunction with the AGREE-test data to estimate field reliability, the following relationship, which generated an R of 0.9 and a σ of 0.55, should be used:

$$\ln\theta_O = 2.43 + 1.02\,\ln\theta_A + 0.36\,(\text{flight length in hours}) - 0.75\,\ln(\text{number of AEG's})$$

The use of this relationship in place of that plotted on Figure 1 results in a more accurate estimate. For example, under the Figure 1 relationship, a 100-hour MTBF demonstrated in the AGREE-type test results in an estimated operational MTBF of 58 hours. Use of the above relationship for various given flight lengths and AEG counts would result in the following estimates of operational MTBF:


AGREE-Test MTBF   Flight Length (Hours)   AEG Count   Estimated Operational MTBF

100               2                       100         87
100               2                       300         48
100               2                       500         24
100               4                       100         180
100               4                       300         100
100               4                       500         81
100               5                       100         258
100               5                       300         142
100               5                       500         72
300               2                       100         268
300               2                       300         147
300               2                       500         75
300               4                       100         550
300               4                       300         304
300               4                       500         153
300               5                       100         785
300               5                       300         435
300               5                       500         219

The addition of flight-hours and AEG-count information provides a more
accurate  estimate  of  whether  the  operational  MTBF  will  be  less  than  or  greater 
than  the  MTBF  observed  during  the  AGREE-type  test.  The  future  inclusion  of 
additional  significant  factors  into  the  analysis  should  enable  even  more  precise 
estimation  of  expected  operational  reliability. 


CONCLUSIONS  AND  RECOMMENDATIONS 


This study has shown that there is a correlation between AGREE test results and operational reliability, although the developed relationship is based on a relatively small number of equipments, all of which were airborne types. However, there will be considerably more operational data available in the near future (see Table A-1, Appendix A), and there is an indication that operational data covering Air Force equipments that have been subjected recently to AGREE testing will also be available soon.

It  is  recommended  that  additional  regression  analysis  be  performed  as  soon 
as  field  results  are  available  on  equipments  that  have  recently  been  subjected 
to  AGREE  testing.  This  will  provide  a larger  data  base  for  deriving  relationships 
of  different  classes  of  equipments  and  covering  different  test  levels. 

It  would  also  be  advisable  to  correlate  the  AGREE  and  operational  test 
results  to  the  various  predictions  made  on  the  equipments  to  enable  estimates  of 
the  operational  reliability  to  be  made  as  accurately  and  as  early  as  possible. 


APPENDIX  A 

DATA  SOURCES  AND  SUMMARIES 

Equipments  that  have  been  subjected  to  AGREE  testing  were  identified  by 
reference  to  the  following  sources : 

(1)  Naval  Air  Systems  Command,  Code  AIR-533E1 

(2)  Naval  Air  Technical  Services  Facility 

(3)  In-house  ARINC  Research  sources 

Table A-1 lists the equipments that these sources identify as having been subjected to AGREE testing. AGREE test level 3 was used on the airborne equipments for which field data were available. The data obtained from the Naval Air Systems Command consisted of the manufacturer's reports, except for the data on the AN/ARA-50, LN-14 and CADC, which consisted only of a reported MTBF. There was a total of 39 equipments for which the results of AGREE testing were available.

Data  from  field  operations  were  available  at  this  time  for  only  20  of  the  39 
equipments  that  have  been  subjected  to  AGREE  testing.  The  20  equipments  and  the 
aircraft  in  which  they  were  observed  are  identified  in  Table  A-2.  The  mean  times 
between  failures  for  these  equipments  are  presented  in  Table  A-3.  When  data  were 
available  from  more  than  one  aircraft  type,  the  values  in  the  "Operational"  columns 
represent  averages  for  the  aircraft  types  observed. 

Table A-4 lists the data used in deriving the relationship

$$\theta_O = 0.017\,\theta_A^{1.76}.$$

Data  from  only  eleven  of  the  twenty  equipments  were  used  because  the  actual 
numbers  of  failures  observed  in  the  field  (statistics  necessary  for  an  accurate 
derivation)  were  not  available  for  the  other  nine  equipments. 


TABLE A-1

EQUIPMENTS SUBJECTED TO AGREE TESTING

Columns: Equipment Type; Source of AGREE-Test Data*; Availability of Operational Data

[The entries in the source and availability columns are not legible in the source; equipment types are listed in the order in which they appear.]

Data Link Sets
AN/ARR-60
AN/ARR-61
AN/USC-2 (Type 2)
AN/ASW-21

AN/ARC-[illegible]4

AN/ARC-[illegible]6

Navigational Sets
AN/ARN-52
AN/ASN-42
AN/ASN-50

AN/ARC-102
AN/ARC-104
AN/ARR-66
AN/ART-36
AN/ARR-69
AN/PRC-49
AN/PRC-63

Radar, Doppler
AN/APN-102
AN/APN-141
AN/APN-153
AN/APN-167

AN/ARA-50

AN/PRT-5
AN/PRT-6

[group heading illegible]

AN/APX-46
AN/APX-64

AN/AJB-3A
AN/AYK-2

AN/APR-27
AN/ASH-50
AN/ASM-198

*Sources of AGREE-test data:

(a) Naval Air Systems Command, Code AIR-533E1

(b) Naval Air Technical Services Facility

(c) In-House ARINC Research Sources


AIRCRAFT/EQUIPMENT  COMBINATIONS  FOR  WHICH  OPERATIONAL  DATA  ARE  AVAILABLE 


TABLE A-3

MTBF VALUES UNDER AGREE-TEST AND OPERATIONAL CONDITIONS

MTBF (Hours) for Conditions and Sources Shown

Equipment Type       Under Operational Conditions        Under AGREE-Test Conditions

AN/ASN-42            85 (1,3) ARINC; 107 (2,4) NATSF     78 (1,2,3) Mfr; 61 (4) NATSF
AN/ASQ-19            9 (1,3) ARINC; 12 (2) NATSF         47 (1,2,3) Mfr
AN/ASQ-57            9 (1,3) ARINC; 13 (2) NATSF         66 (1,2,3) Mfr
AN/ASQ-58            13 (1,3) ARINC; 12 (2) NATSF        56 (1,2,3) Mfr
AN/AYK-2             228 (1,2)                           454 (1,2)
AN/AJB-3A            93 (1,2)                            175 (1,2)
AN/APN-141           109 (1,2,4) NATSF                   115 (1,2,4) NATSF
AN/APN-153           181 (1,2,4) NATSF                   155 (1,2,4) NATSF
AN/ARC-94            115 (1,2,4) NATSF                   154 (1,2,4) NATSF
AN/ARN-52            140 (1,2,4) NATSF                   67 (1,2,4) NATSF
AN/USC-2 (Type 2)    178 (1,2,4) NATSF                   100 (1,2,4) NATSF
AN/ARR-60            74 (1,2,3) ARINC                    164 (1,2,3) Mfr
AN/ARR-61            113 (1,2,3) ARINC                   198 (1,2,3) Mfr
AN/ARC-34            33 (1,2,3) ARINC                    41 (1,2,3) Mfr
AN/APX-102           21 (1,2,3) ARINC                    32 (1,2,3) Mfr
AN/ARC-21            135                                 108*
AN/ARC-58            292                                 79*
AN/ARA-50            400 (1,2)                           1,000 (1,2)
LN-14                26 (1,2)                            151 (1,2)
CADC                 91 (1,2)                            126 (1,2)

*Demonstrated at ambient conditions.

Note: The numbers in parentheses identify the regression runs for which the data were used (see Table 1 in the main text). Sources: ARINC = ARINC Research; Mfr = manufacturer; NATSF = Naval Air Technical Services Facility. A source is shown only where it can be determined from the run numbers and Table 1.


TABLE A-4

RELIABILITY DATA USED IN DERIVING THE MTBF RELATIONSHIPS

Equipment     AGREE-Test   Operational   Number of   Average Flight   Number of Failures
Type          MTBF         MTBF          AEG's*      Length           Observed

AN/ASN-42     78           107           530         5.7              510
AN/ASQ-19     47           12            325         1.7              479
AN/ASQ-57     66           13            391         1.9              536
AN/ASQ-58     56           12            447         2.1              262
AN/AYK-2      454          228           77          2.7              4
AN/AJB-3A     175          66            113         1.7              647
              175          121           113         1.6              605
AN/APN-153    155          235           253         5.7              161
              155          167           253         2.1              10
              155          256           253         1.6              91
              155          45            253         1.9              121
AN/ARC-94     154          115           188         1.[illegible]    478
              154          235           188         [illegible]      64
AN/ARN-52     67           301           144         [illegible]      154
              67           105           144         [illegible]      144
              67           151           144         [illegible]      188
AN/USC-2      100          178           349         2.1              [illegible]
AN/APN-141    115          111           200         1.7              658
              115          70            200         1.9              60
              115          120           200         1.7              47

*As reported in NATSF-MR No. 2, January through June 1965.


APPENDIX  B 


REGRESSION  ANALYSIS 


Adapted  from 

System  Reliability  Prediction  by  Function 
Volume  I,  ARINC  Research  Corporation 
Publication  241-01-1-375,  27  May  1963 




1.  General  Description 


This Appendix summarizes the important characteristics of the multiple regression technique used for developing a prediction procedure. Some familiarity with basic statistical theory is assumed. Since this summary is necessarily limited in scope, certain sources* should be consulted for more detailed discussion of regression analysis.


The application of the regression technique presupposes some relationship between a dependent variable Y and one or more independent variables $X_1, X_2, \ldots, X_r$. The simplest case to consider is one in which the relationship can be approximated by a general linear equation of the form

$$Y = \beta_0 + \sum_{i=1}^{r} \beta_i X_i + \epsilon \qquad \text{(B-1)}$$

The $\{\beta_i\}$ are parameters which, in n-dimensional space, generate the regression plane. The quantity $\epsilon$ represents an error term, which is the measure of the variation in Y not accounted for by the regression plane. To achieve the general form of the equation, transformations may be applied to original Y and X values. For example, if $Y = A + B/X_1 + C X_2^2$, then the transformations $X_1' = 1/X_1$ and $X_2' = X_2^2$ yield an equation equivalent in form to equation (B-1).

Through analysis of data involving m observations of Y values and corresponding X values, e.g., $(Y_j;\ X_{1j}, X_{2j}, \ldots, X_{rj})$ for $j = 1, 2, \ldots, m$, estimates of the $\{\beta_i\}$ can be obtained. This results in the estimating equation

$$Y = b_0 + \sum_{i=1}^{r} b_i X_i + e \qquad \text{(B-2)}$$

where the $\{b_i\}$, termed regression coefficients, are estimates of the true but unknown $\{\beta_i\}$, and e is the residual of the true Y about the estimated regression plane;

$$e = Y - \left(b_0 + \sum_{i=1}^{r} b_i X_i\right).$$

*Several good references are:

R. L. Anderson and T. A. Bancroft, Statistical Theory in Research, McGraw-Hill;
A. Hald, Statistical Theory with Engineering Applications, John Wiley & Sons;
C. Goulden, Methods of Statistical Analysis, John Wiley & Sons.


In the usual development of regression theory, the following assumptions and conditions are imposed:

(a) For estimating the regression equation, no assumptions of the distribution of the $\{X_i\}$ need be made. For inferential purposes, either of two general models may be involved:

Type I Model - The X's are fixed variables in that no probability distributions are associated with them.

Type II Model - The X's are considered to be stochastic variables.

(b) For a fixed set of X's, $\{X_i\}$, the Y's are normally and independently distributed with mean $\beta_0 + \sum_{i=1}^{r} \beta_i X_i$ and variance $\sigma^2$. (The normality assumption is only required for valid applications of the usual significance tests and confidence-interval estimation procedures. Non-extreme departures from normality are generally not too serious.)

(c) For every set of X's, the variance of Y is the same. (This assumption can be relaxed to include the case where the variance is proportional to the X's. See Section 3.3.)

Assumptions (b) and (c) are equivalent to the assumption that the errors from the true regression surface are normally and independently distributed with zero mean and variance $\sigma^2$. From equation (B-2), the estimating equation for the expected value of Y is, therefore,

$$E(Y) = \hat{Y} = b_0 + \sum_{i=1}^{r} b_i X_i \qquad \text{(B-3)}$$


With the above assumptions, it can be shown that the method of least squares is best for obtaining the estimates $\{b_i\}$ in the sense that these estimates are unbiased and have minimum variance. The least-squares method is one for which the estimates $\{b_i\}$ are chosen to minimize the sum of squares of deviations from the estimated regression plane, termed the error sum of squares and defined mathematically by

$$\mathrm{SSE} = \sum_{j=1}^{m} \left[ Y_j - \left( b_0 + \sum_{i=1}^{r} b_i X_{ij} \right) \right]^2 \qquad \text{(B-4)}$$

where $Y_j$ is the jth observed value of Y with corresponding X values of $\{X_{ij}\}$.

On applying the usual minimization procedures to equation (B-4) by differentiating with respect to the $\{b_i\}$, the well-known r "normal equations" are obtained, the kth of which (k = 1, 2, ..., r) is

$$b_1 \sum x_k x_1 + b_2 \sum x_k x_2 + \cdots + b_k \sum x_k^2 + \cdots + b_r \sum x_k x_r = \sum x_k y \qquad \text{(B-5)}$$

where the summations are over the sample values and $x_i = X_i - \bar{X}_i$, $y = Y - \bar{Y}$, the deviation of a sample value from its sample mean.

Applying elementary algebraic methods to these r linear equations yields the least-squares estimates $\{b_i\}$.
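As an illustration of the normal equations (B-5), the following Python sketch forms the sums of squares and cross-products from deviations about the sample means and solves the resulting r linear equations. The function name and the data are hypothetical, and NumPy's linear solver stands in for the "elementary algebraic methods" mentioned above. The intercept is recovered from the sample means, which follows from writing the fitted plane in deviation form.

```python
import numpy as np

def solve_normal_equations(X, Y):
    """Solve the r normal equations for b_1..b_r, then recover b_0.

    X has shape (m, r): m observations of r independent variables.
    Y has shape (m,).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x = X - X.mean(axis=0)        # deviations x_i = X_i - mean(X_i)
    y = Y - Y.mean()              # deviations y = Y - mean(Y)
    cross_products = x.T @ x      # entry (k, i) is the sum of x_k * x_i
    right_side = x.T @ y          # entry k is the sum of x_k * y
    b = np.linalg.solve(cross_products, right_side)
    b0 = Y.mean() - X.mean(axis=0) @ b   # plane passes through the means
    return b0, b

# Hypothetical data lying exactly on the plane Y = 2 + 3*X1 - X2.
X_obs = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]])
Y_obs = 2.0 + 3.0 * X_obs[:, 0] - 1.0 * X_obs[:, 1]
b0, b = solve_normal_equations(X_obs, Y_obs)
```

Because the example data contain no error term, the solution reproduces the generating coefficients; with real data the same steps give the least-squares estimates.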

2.  Regression  and  Correlation  Measures 

The regression coefficients $\{b_i\}$ form the equation for predicting the expected value of Y for a given set of X's. The data used to obtain these estimates will also provide further information on the true relationship and characteristics of the prediction equation through calculation of various regression and correlation measures. Several of the more important measures are discussed in this section.

2.1  Variation  About  The  Regression  Plane 

The scatter in the vertical or Y direction of the observed values of Y about the regression plane is perhaps the most important measure for evaluating the prediction ability of the derived equation. The population measure is the variance of $\epsilon$, denoted by $\sigma^2$. Provided the assumptions stated previously hold, an unbiased estimate of $\sigma^2$, e.g., $s^2$, can be obtained from the deviations of the observed Y values from the computed regression plane. The positive square root of this variance, s, is often called the standard error of estimate.


With the normality assumption and a large sample size, approximately 95% of the sample points will lie within ±2s from the estimated regression plane. Description of the population distribution in terms of confidence intervals is discussed in Section 2.5.
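The standard error of estimate and the ±2s rule of thumb can be illustrated with simulated data. The sketch below is an assumption-laden example, not part of the original analysis: the function name is hypothetical, the data are randomly generated with normal errors, and the divisor m - r - 1 (observations minus fitted coefficients) is the usual choice for an unbiased variance estimate, which the text does not spell out.

```python
import numpy as np

def standard_error_of_estimate(X, Y):
    """Return (s, residuals) for a least-squares fit of Y on X with intercept."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(Y), -1)
    m, r = X.shape
    design = np.column_stack([np.ones(m), X])
    coeffs, *_ = np.linalg.lstsq(design, Y, rcond=None)
    residuals = Y - design @ coeffs
    sse = float(residuals @ residuals)   # error sum of squares
    s = np.sqrt(sse / (m - r - 1))       # assumed unbiased-variance divisor
    return s, residuals

# Simulated data: Y = 1 + 2*X plus normal errors with standard deviation 0.5.
rng = np.random.default_rng(0)
x_sim = rng.uniform(0.0, 10.0, size=500)
y_sim = 1.0 + 2.0 * x_sim + rng.normal(0.0, 0.5, size=500)
s, resid = standard_error_of_estimate(x_sim, y_sim)
share_within_2s = float(np.mean(np.abs(resid) <= 2.0 * s))
```

With a sample of this size, `s` should land close to the simulated error standard deviation and `share_within_2s` close to 0.95.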

2.2  Variance  of  the  Regression  Coefficients 

The variances of the $\{b_i\}$ are also obtainable from the source data. These statistics are useful for comparing regression coefficients for two sets of data, for evaluating the significance of a variable on the overall regression, and for determining if an observed b differs significantly from a theoretical value.

2.3  Correlation  Measures 

Correlation  theory  is  concerned  primarily  with  measuring  the  degree  of 
association  or  covariation  between  two  or  more  variables  — as  differentiated 
from  regression  theory,  which  attempts  to  describe  the  relationship  through  a 
regression  equation.  Since  correlation  theory  has  been  fully  developed  only  when 
Y and  the  X's  form  a multivariate  normal  distribution,  interpretation  of  correla- 
tion measures  for  other  cases  is  somewhat  uncertain.  However,  correlation  measures 
can  be  used  for  significance  tests  (see  Section  2.4)  and  do  offer  some  idea  of  the 
degree  of  association,  provided  that  the  values  of  the  independent  variables  are 
not  preselected. 

2.3.1  Simple  Correlation  Coefficient 

Given two variables X and Y, a simple correlation coefficient is defined by

$$\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

where $\sigma_x$ and $\sigma_y$ are standard deviations of X and Y respectively and $\sigma_{xy}$ is the
covariance between X and Y defined as E{[X-E(X)] [Y-E(Y)]}. It can be shown that
$-1 \le \rho \le 1$.


A more meaningful description of the correlation coefficient can be presented
through a regression viewpoint. Let $\sigma_y^2$ represent the variance of Y in terms of
deviations from the mean of all Y values. Let $\sigma_{y|x}^2$ represent the variance of Y
when deviations from the regression line of Y on X are used. Then $(\sigma_y^2 - \sigma_{y|x}^2)$
represents the reduction in the variance of Y due to estimating Y from X rather
than from the average Y value. The square root of the fraction of total variation
accounted for by regression is the population correlation coefficient, i.e.,

$$\rho = \pm\sqrt{\frac{\sigma_y^2 - \sigma_{y|x}^2}{\sigma_y^2}} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

If $\rho = 0$, then $\sigma_y^2 = \sigma_{y|x}^2$, implying no reduction in $\sigma_y^2$ through knowledge
of X; in other words, the regression line has slope 0 and is coincident with the
mean of Y. If $\rho = \pm 1$, then $\sigma_{y|x}^2 = 0$, implying all points (X,Y) lie on the
regression line. The sign of $\rho$ is the same as the slope of the regression line.
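The equivalence of the two descriptions of the correlation coefficient can be checked on sample data. The sketch below computes the sample analogue of the correlation coefficient once from the covariance and once from the fraction of variation removed by the least-squares line, taking the sign from the slope. The data values and the function name are hypothetical.

```python
import math

def correlation_two_ways(xs, ys):
    """Return (r from the covariance, r from the variance-reduction argument)."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    r_direct = sxy / math.sqrt(sxx * syy)

    # Least-squares line of Y on X and the squared deviations from it.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))

    # Fraction of the total variation accounted for by the regression.
    fraction = max(0.0, (syy - sse) / syy)
    r_from_variance = math.copysign(math.sqrt(fraction), slope)
    return r_direct, r_from_variance

# Hypothetical observations; reversing Y gives a negative slope.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]
print(correlation_two_ways(xs, ys))
print(correlation_two_ways(xs, ys[::-1]))
```

Both members of each printed pair agree to rounding error, and the sign follows the slope of the fitted line.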

2.3.2  Multiple  Correlation  Coefficient 

The multiple correlation coefficient, R, is a relative measure of the
association between three or more variables. It is analogous to the simple correlation
coefficient except that its sign is always taken to be positive. The square of
the multiple correlation coefficient, $R^2$, is known as the coefficient of determination.
This quantity can be used to determine if the addition of an independent variable
is significant by comparison of the sample $R^2$ for (n-1) variables to the sample
$R^2$ for n variables.


2.3.3  Partial  Correlation  Coefficient 

A partial  correlation  coefficient  measures  the  association  between  two 
variables  when  the  influence  of  all  other  variables  considered  is  eliminated. 
Simple  correlation  merely  ignores  the  influence  of  other  variables.  Partial 
correlation  coefficients  give  insight  into  which  independent  variables  are  closely 
related  to  the  dependent  variable. 


2.4  Significance  Tests 

With the assumption of normally and independently distributed Y values for
a given set of $\{X_i\}$, various tests may be performed to determine the statistical
significance of the computed statistics.

The overall significance of the regression can be determined through an F
test by comparison of the sum of squares due to regression with the sum of squares
due to error. We have

$$SSE = \sum_{j=1}^{m} \Big( y_j - \sum_{i=1}^{r} b_i x_{ij} \Big)^2$$

where the y and $\{x_i\}$ are deviations from their respective means. On expanding
the summation we find

$$SSE = \sum_{j=1}^{m} y_j^2 - \sum_{i=1}^{r} b_i \sum_{j=1}^{m} x_{ij} y_j \tag{B-6}$$

Since $\sum y^2$ represents the total sum of squares, the second term on the right
side of equation B-6 represents the sum of squares due to regression, or

$$SSR = \sum_{i=1}^{r} b_i \sum_{j=1}^{m} x_{ij} y_j \tag{B-7}$$


It can be shown that if all $\beta_i$ are equal to zero, the case equivalent to
no regression, then

$$E(SSE) = (m-r-1)\,\sigma^2$$
$$E(SSR) = r\,\sigma^2$$

To test this hypothesis, compute the quantity (SSR)(m-r-1)/[(SSE)(r)], which is
distributed as F with r and (m-r-1) degrees of freedom. The associated
analysis-of-variance table is given below.


ANALYSIS-OF-VARIANCE TABLE FOR TESTING
SIGNIFICANCE OF THE OVERALL REGRESSION

Source of            Degrees of     Sum of         Mean
Variation            Freedom        Squares        Square

Regression           r              SSR            SSR/r
(r independent
variables)

Error                m-r-1          SSE            SSE/(m-r-1) = s^2

Total                m-1            Σy^2

Division of SSR and SSE by $\sum y^2$ yields the statistic

$$F = \frac{(SSR)(m-r-1)/\sum y^2}{(SSE)(r)/\sum y^2} = \frac{R^2\,(m-r-1)}{(1-R^2)\,(r)} \tag{B-8}$$

which is often used for the significance test.
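As a numerical check on equation B-8, the sketch below computes the overall F statistic once from the sums of squares and once from $R^2$, using $R^2 = SSR/(SSR + SSE)$. The sums of squares, the sample size, and the function names are hypothetical values chosen for the illustration.

```python
def f_from_sums_of_squares(ssr, sse, m, r):
    """Overall F statistic: (SSR)(m-r-1) / [(SSE)(r)]."""
    return (ssr * (m - r - 1)) / (sse * r)

def f_from_r_squared(r2, m, r):
    """The same statistic written in terms of R^2 (equation B-8)."""
    return (r2 * (m - r - 1)) / ((1.0 - r2) * r)

# Hypothetical case: 30 observations, 3 independent variables.
ssr, sse, m, r = 48.0, 52.0, 30, 3
r2 = ssr / (ssr + sse)  # the total sum of squares is SSR + SSE

f_ss = f_from_sums_of_squares(ssr, sse, m, r)
f_r2 = f_from_r_squared(r2, m, r)
print(f_ss, f_r2)
```

The two values agree; either would be compared with a tabulated F for r and (m-r-1) degrees of freedom.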


In a similar manner, the addition of (r - p) independent variables to the
regression equation is tested for significance by computing the statistic

$$F = \frac{(SSR_r - SSR_p)\,(m-r-1)}{(SSE_r)\,(r-p)} = \frac{(R_r^2 - R_p^2)\,(m-r-1)}{(1-R_r^2)\,(r-p)} \tag{B-9}$$

where the subscripts r and p refer to the number of independent variables being
considered.
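Equation B-9 can be written as a small function. The sketch below is an illustration only; the function name and the $R^2$ values are hypothetical. Setting p = 0 with $R_p^2 = 0$ reduces the statistic to the overall test of equation B-8.

```python
def partial_f(r2_r, r2_p, m, r, p):
    """F statistic for adding (r - p) variables to a p-variable regression.

    r2_r and r2_p are the squared multiple correlation coefficients with
    r and p independent variables; m is the number of observations.
    """
    if not 0 <= p < r:
        raise ValueError("need 0 <= p < r")
    if m - r - 1 <= 0:
        raise ValueError("need m > r + 1 for the error degrees of freedom")
    return ((r2_r - r2_p) * (m - r - 1)) / ((1.0 - r2_r) * (r - p))

# Hypothetical case: a third variable raises R^2 from 0.40 to 0.48 (m = 30).
f_added = partial_f(0.48, 0.40, m=30, r=3, p=2)
# With p = 0 and R_p^2 = 0 the statistic is the overall F of equation B-8.
f_overall = partial_f(0.48, 0.0, m=30, r=3, p=0)
print(f_added, f_overall)
```

The first value would be compared with a critical F for (r-p) = 1 and (m-r-1) = 26 degrees of freedom.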


Tests on individual variables may be performed by examining the hypothesis
that a particular $\beta_i = 0$. This is accomplished through use of the fact that the
quantity $(b_i - \beta_i)/s_{b_i}$, where $s_{b_i}^2 = s^2 / \sum_j x_{ij}^2$, is distributed as Student's t with
(m-r-1) degrees of freedom.

2.5  Confidence  Interval  Estimates 

Equation  B-3  provides  a point  estimate  of  the  expected  value  of  Y for  a known 
set  of  X's.  Confidence  limits  for  the  true  mean  value  of  Y as  well  as  limits  for 
an  individual  Y value  for  a given  set  of  X's  are  also  obtainable  from  the  data 
through  the  assumption  of  the  normality  of  the  distribution  of  the  Y arrays. 


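For the confidence interval on the mean value of Y, the variance of
$[\hat{Y} - E(Y)]$ is required. For just one independent variable that is assumed to
have the value $X^*$, with a corresponding predicted value of the mean of Y equal
to $\hat{Y}$,

$$\hat{Y} - E(Y) = b_0 + b_1 X^* - (\beta_0 + \beta_1 X^*) = (b_0 - \beta_0) + (b_1 - \beta_1) X^*$$

$$\mathrm{var}[\hat{Y} - E(Y)] = \sigma^2 \left[ \frac{1}{m} + \frac{(X^* - \bar{X})^2}{\sum_{j=1}^{m} (X_j - \bar{X})^2} \right] \tag{B-10}$$

For an individual Y value,

$$\mathrm{var}(\hat{Y} - Y) = \sigma^2 \left[ 1 + \frac{1}{m} + \frac{(X^* - \bar{X})^2}{\sum_{j=1}^{m} (X_j - \bar{X})^2} \right] \tag{B-11}$$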

Using $s^2$, the estimate of $\sigma^2$, in equations B-10 and B-11, the 100(1 - α)%
confidence interval for E(Y) is

$$\hat{Y} \pm t_{\alpha/2} \sqrt{\mathrm{var}[\hat{Y} - E(Y)]} \tag{B-12}$$

The 100(1 - α)% confidence interval for an individual Y value is

$$\hat{Y} \pm t_{\alpha/2} \sqrt{\mathrm{var}(\hat{Y} - Y)} \tag{B-13}$$

Extensions of equations B-12 and B-13 can be obtained for the multivariate
case; they are discussed in Section 3.4.3, where computational aspects are considered.
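The single-variable interval formulas (B-10 through B-13) can be sketched in Python as follows. The data are hypothetical, the function name is an assumption, and the critical value of Student's t is passed in by the caller (2.306 is the tabulated two-sided 95% value for 8 degrees of freedom).

```python
import math

def simple_regression_intervals(xs, ys, x_star, t_crit):
    """Intervals for E(Y) and for an individual Y at X = x_star (one X variable)."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    # s^2 estimates sigma^2 with m - 2 degrees of freedom.
    s2 = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (m - 2)

    y_hat = b0 + b1 * x_star
    var_mean = s2 * (1.0 / m + (x_star - x_bar) ** 2 / sxx)         # B-10
    var_indiv = s2 * (1.0 + 1.0 / m + (x_star - x_bar) ** 2 / sxx)  # B-11
    half_mean = t_crit * math.sqrt(var_mean)                        # B-12
    half_indiv = t_crit * math.sqrt(var_indiv)                      # B-13
    return {
        "y_hat": y_hat,
        "mean": (y_hat - half_mean, y_hat + half_mean),
        "individual": (y_hat - half_indiv, y_hat + half_indiv),
    }

# Hypothetical sample of 10 observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [3.1, 4.8, 7.2, 8.9, 11.4, 12.7, 15.1, 17.2, 18.8, 21.3]
near = simple_regression_intervals(xs, ys, x_star=5.5, t_crit=2.306)
far = simple_regression_intervals(xs, ys, x_star=12.0, t_crit=2.306)
print(near)
print(far)
```

The interval for an individual Y is always wider than the interval for E(Y), and both widen as $X^*$ moves away from the sample mean of X.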

3.  Application  to  Reliability  Prediction 

3.1  Type  of  Regression  Model  Relationships 

As  in  many  practical  applications,  strict  conformance  to  the  theoretical 
requirements  was  not  possible  for  this  project.  The  types  of  regression  models 
appropriate  to  the  variables  used  were  not  preselected,  but  were  those  already 
existing.  Some  of  the  variables,  such  as  AEG  counts,  can  be  considered  to  be 
fixed with no measurement errors. Others, such as frequency and power consumption,
do have associated probability distributions and possibly non-negligible measurement
errors.

Therefore,  a strict  designation  of  a Type  I or  Type  II  model  cannot  be  made. 
This,  however,  will  not  affect  the  analysis  with  respect  to  the  estimate  of  the 
regression  equation  if  the  listed  assumptions  are  reasonably  satisfied.  It  is 
further  believed  that  the  degree  of  rigor  achieved  with  respect  to  other  aspects 
of  the  analysis  is  satisfactory  and  consistent  with  the  intended  use  of  the 
results. 


3.2  Transformations  of  Independent  Variables 

As indicated previously, a model that is linear in the regression parameters
is required. Transformations were applied to several of the independent variables
to linearize suspected logarithmic or multiplicative relationships through
consideration of the physical processes involved.

In those cases where no prior knowledge or judgments existed, transformations
that gave the best results were used. The transformations were usually determined
on the basis of analyzing the partial correlation coefficients, or the significance
of the regression coefficients, for two or more transformations that were thought
to be reasonable for the variable under consideration.

Graphical procedures normally used to determine the form of the relationship,
and thus the appropriate transformation, were not feasible because of the many
independent variables involved.

3.3  Transformation  of  Dependent  Variable 

The dependent variable under consideration is the equipment mean life expressed
as the mean time between failures (MTBF). The failure times of the systems
were approximately distributed in accordance with the exponential assumption, i.e.,
the density of failure time t is

$$f(t) = \frac{1}{\theta}\, e^{-t/\theta} \tag{B-14}$$

where $\theta$ is the true mean life.

If k failures are observed in a total of T hours, the estimate of $\theta$ is

$$\hat{\theta} = \frac{T}{k}$$

The density of the statistic $\hat{\theta}$ can be shown to be

$$g(\hat{\theta}) = \frac{1}{(k-1)!} \left(\frac{k}{\theta}\right)^{k} \hat{\theta}^{\,k-1}\, e^{-k\hat{\theta}/\theta} \tag{B-15}$$

$$g(\hat{\theta}) = \frac{1}{\Gamma(k)} \left(\frac{k}{\theta}\right)^{k} \hat{\theta}^{\,k-1}\, e^{-k\hat{\theta}/\theta} \tag{B-16}$$

Equation (B-16) represents a gamma density with a mean of $\theta$ and variance
$\theta^2/k$. Thus, even if the number of failures were identical for all systems, the
variances of the Y's are not constant but are a function of the true mean lives
of the system. A common procedure for stabilizing the variance when it is
proportional to the square of the mean is to employ the logarithmic transformation
$Y = \log \hat{\theta}$. This transformation, fortunately, will also tend to yield an
approximately normal distribution for the Y's*.
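The variance-stabilizing effect of the logarithmic transformation can be illustrated by simulation. The sketch below assumes a test that is stopped at the kth failure, so that T is the sum of k exponential failure times; the true mean lives, k, the number of replicates, and the function name are all hypothetical choices for the illustration.

```python
import math
import random
import statistics

def simulate_mtbf_estimates(theta, k, replicates, rng):
    """Draw values of theta_hat = T/k, where T is the sum of k exponential failure times."""
    rate = 1.0 / theta  # random.expovariate takes the rate, not the mean
    return [
        sum(rng.expovariate(rate) for _ in range(k)) / k
        for _ in range(replicates)
    ]

rng = random.Random(42)
k = 20
results = {}
for theta in (10.0, 1000.0):
    estimates = simulate_mtbf_estimates(theta, k, 4000, rng)
    var_raw = statistics.variance(estimates)
    var_log = statistics.variance([math.log(e) for e in estimates])
    results[theta] = (var_raw, var_log)
    print(f"theta={theta:7.1f}  var(theta_hat)/theta^2={var_raw / theta**2:.4f}  "
          f"var(log theta_hat)={var_log:.4f}")
```

In a run of this kind the variance of the untransformed estimate scales with the square of the true mean life, while the variance of the logarithm is nearly the same for both values of the mean life.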

In order to account for varying number of failures, the Y values were also
weighted by their respective number of failures, since if $\mathrm{var}(\hat{\theta}) = \theta^2/k$, then
through use of a Taylor's series, $\mathrm{var}(\log \hat{\theta}) \approx 1/k$. The normal equations are
then obtained by minimizing

$$SSE = \sum_{j=1}^{m} w_j \Big( Y_j - b_0 - \sum_{i=1}^{r} b_i X_{ij} \Big)^2 \tag{B-17}$$

with respect to the $\{b_i\}$, where $Y_j = \log \hat{\theta}_j$ and $w_j$ is proportional to the number
of failures on the jth equipment type.

3.4  Computational  Aspects 

This section summarizes the formulas and approach used for obtaining the
prediction equation, performing significance tests, and obtaining the final results.


* See,  for  example,  K.  Brownlee,  Statistical  Theory  and  Methodology  in  Science 
and  Engineering,  John  Wiley  and  Sons,  p.  ili>. 


3.4.1  Regression and Correlation Statistics
Notation:

n = total number of variables (also the subscript pertaining to the
dependent variable)

r = n-1 = total number of independent variables

m = total number of sample observations (equivalent to the number of
equipment types)

$X_{ij}$ = value of the ith variable on the jth observation (i = 1, 2, ..., n;
j = 1, 2, ..., m)

S = summation operation over the sample observations; e.g.,
$SX_i = \sum_{j=1}^{m} X_{ij}$

Sample means:

$$\bar{X}_i = \frac{SX_i}{m} \tag{B-18}$$

Sample variances:

$$s_i^2 = \frac{S(X_i - \bar{X}_i)^2}{m-1} = \frac{m\,SX_i^2 - (SX_i)^2}{m\,(m-1)} \tag{B-19}$$

Simple Correlation Coefficients:

$$r_{ij} = \frac{S(X_i - \bar{X}_i)(X_j - \bar{X}_j)}{(m-1)\, s_i s_j} = \frac{m\,SX_iX_j - SX_i\,SX_j}{m\,(m-1)\, s_i s_j} \tag{B-20}$$

For obtaining multiple regression statistics by computer utilization, the
inverse matrix of the simple correlation coefficients can be used advantageously.

Let $r^{ij} = a_{ij}$ represent an element of this inverse matrix. Then the following
computational equations hold:


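Regression Coefficients:

$$b_i = -\frac{a_{ni}}{a_{nn}} \cdot \frac{s_n}{s_i}, \qquad i = 1, 2, \ldots, r \tag{B-21}$$

$$b_0 = \bar{X}_n - \sum_{i=1}^{r} b_i \bar{X}_i \tag{B-22}$$

Partial Correlation Coefficients:

$$r_{ni\cdot} = \frac{-a_{ni}}{\sqrt{a_{nn}\, a_{ii}}}, \qquad i = 1, 2, \ldots, r \tag{B-23}$$

Standard Error of the Regression Coefficients:

$$s_{b_i} = \frac{s_n}{s_i} \left[ \frac{a_{nn} a_{ii} - a_{ni}^2}{(m-n)\, a_{nn}^2} \right]^{1/2} \tag{B-24}$$

Multiple Correlation Coefficient:

$$R = \sqrt{1 - \frac{1}{a_{nn}}} \tag{B-25}$$

Standard Error of Estimate:

$$s = s_n \sqrt{\frac{m-1}{(m-n)\, a_{nn}}} \tag{B-26}$$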
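The computational equations based on the inverse correlation matrix can be sketched with NumPy as shown below. The function name, the synthetic data, and the dictionary keys are assumptions made for the illustration; the dependent variable is taken to be the last column, in keeping with the subscript n. The standard-error line uses the form implied by the standard error of estimate and the Gauss multipliers, which is an assumption about the exact layout of the source equation.

```python
import numpy as np

def regression_from_inverse_correlation(data):
    """Regression statistics from the inverse of the simple correlation matrix.

    `data` has one row per observation and one column per variable, with the
    dependent variable in the last column.
    """
    data = np.asarray(data, dtype=float)
    m, n = data.shape
    means = data.mean(axis=0)
    sds = data.std(axis=0, ddof=1)                      # sample standard deviations
    a = np.linalg.inv(np.corrcoef(data, rowvar=False))  # elements a_ij

    a_nn = a[-1, -1]
    a_ni = a[-1, :-1]
    a_ii = np.diag(a)[:-1]

    b = -(a_ni / a_nn) * (sds[-1] / sds[:-1])           # regression coefficients
    b0 = means[-1] - b @ means[:-1]                     # intercept
    partial = -a_ni / np.sqrt(a_nn * a_ii)              # partial correlations with Y
    se_b = (sds[-1] / sds[:-1]) * np.sqrt(
        (a_nn * a_ii - a_ni ** 2) / ((m - n) * a_nn ** 2)
    )                                                   # standard errors of the b_i
    R = np.sqrt(1.0 - 1.0 / a_nn)                       # multiple correlation
    s = sds[-1] * np.sqrt((m - 1) / ((m - n) * a_nn))   # standard error of estimate
    return {"b0": b0, "b": b, "partial": partial, "se_b": se_b, "R": R, "s": s}

# Synthetic example: 40 observations, 3 independent variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.3, size=40)
data = np.column_stack([X, y])
stats = regression_from_inverse_correlation(data)
print(stats["b0"], stats["b"], stats["R"], stats["s"])
```

The results can be compared with an ordinary least-squares fit of the same data; the two routes should agree to rounding error.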


3.4.2  Computational  Procedure  for  Significance  Tests 

For a set of n variables, the overall significance of the regression can be
evaluated by computing the following statistic

$$F = \frac{R^2\,(m-n)}{(1-R^2)\,(n-1)} \tag{B-27}$$

and comparing it with a critical F for (n-1) and (m-n) degrees of freedom at a
preselected significance level of α. If the calculated F is greater than the
critical F, then the hypothesis that all of the $\beta_i$ are equal to zero is rejected
and the overall regression is judged to be significant.

In  order  to  determine  which  of  the  r independent  variables  are  the  significant 
contributors  to  the  regression,  a systematic  procedure  is  required  to  reduce  the 
computational  problem  to  manageable  size. 

To illustrate the problem, if it were decided that, at most, five of twenty
independent variables would be finally selected, one approach would be to try all
possible combinations of five-out-of-twenty and pick the best of these. This
would then involve at least $\binom{20}{5}$, or more than 15,000 regression runs, a task
prohibitively expensive even for the larger electronic computers.
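The size of the exhaustive search can be confirmed with a short calculation. The variable names below are illustrative only.

```python
import math

# Number of distinct sets of exactly five variables chosen from twenty.
exactly_five = math.comb(20, 5)

# Number of distinct non-empty sets of at most five variables.
at_most_five = sum(math.comb(20, k) for k in range(1, 6))

print(exactly_five, at_most_five)
```

The first count is 15,504, which is the "more than 15,000" figure quoted above; allowing smaller subsets as well raises the count further.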




The systematic approach used in this project for significant variable selection
is called the "square root" method; it is described in detail in a paper by
A. Summerfield and A. Lubin:

"A Square Root Method of Selecting a Minimum Set of Variables
in Multiple Regression", Psychometrika.
Part I: The Method; Volume 16, No. 3, September 1951, pp. 271-284.
Part II: A Worked Example; Volume 16, No. 4, December 1951,
pp. 425-437.

This method involves determining partial correlations of each independent
variable with the dependent variable when the influence of all previously selected
variables is eliminated. The highest semi-partial correlation indicates which of
the remaining non-selected variables will add most to the overall regression.
The reduction in the error sum of squares (addition to $R^2$) attributed to this
variable is then tested by the F-ratio criterion. If the variable is determined
to be insignificant, no other single remaining variable will be significant.*

The computational procedure involves the construction of a triangular matrix
(called the square root matrix by Summerfield and Lubin) of semi-partial
correlation coefficients from the original matrix of simple correlation coefficients.

If $S_{ij}$ represents an element of this square root matrix, it has the following properties:

$S_{ij} = 0$ for i < j

$S_{ij}$ = semi-partial correlation of variable j to variable i when variables
1, 2, ..., j-1 (preselected) are held constant

$S_{i1} = r_{i1}$ for i = 1, ..., n.

The $S_{ij}$ are obtained as follows:

$$S_{jj} = \Big( 1 - \sum_{k=1}^{j-1} S_{jk}^2 \Big)^{1/2}, \qquad j = 1, \ldots, n \tag{B-28}$$

$$S_{ij} = \frac{r_{ij} - \sum_{k=1}^{j-1} S_{ik} S_{jk}}{S_{jj}}, \qquad i > j,\; i = 2, \ldots, n;\; j = 1, 2, \ldots, r \tag{B-29}$$


* It is possible to consider the addition of a combination of variables. This
has not been done directly in this project.




The selection order is determined as follows:

Let $S_{ia}(b)$ = the ath column of the square root matrix when variable b is
assumed to be the ath most important:

(1)  That variable j for which $r_{jn}$ is a maximum is the first selected
variable. Call this variable $j_1$. Then $S_{i1} = r_{i j_1}$.

(2)  Calculate $S_{n2}(j)$ for all $j \ne j_1$. The value j for which $S_{n2}(j)$ is a
maximum is the second most important variable. Call this variable $j_2$.
The second column of the square root matrix is then $S_{i2} = S_{i2}(j_2)$.

(3)  In general, for obtaining the pth most important variable after variables
$j_1, j_2, \ldots, j_{p-1}$ have been selected, calculate $S_{np}(j)$ for $j \ne j_1, j_2,
\ldots, j_{p-1}$. That value j for which $S_{np}(j)$ is a maximum is selected and $S_{ip} = S_{ip}(j_p)$.

After each selection an F test is performed to determine if the addition
to $R^2$ is significant. Thus, to determine if the pth selected variable is
significant, compute

$$F = \frac{(R_p^2 - R_{p-1}^2)\,(m-1-p)}{1 - R_p^2} \tag{B-30}$$

and compare to a critical F based on 1 and (m-1-p) degrees of freedom for a chosen
significance level α.

The  final  regression  results  can  then  be  obtained  from  the  correlation  matrix 
and  the  means  and  variances  of  those  variables  determined  to  be  significant. 
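The selection procedure can be sketched in Python as a forward selection on the correlation matrix, building one column of the square root matrix per accepted variable. This is an illustrative reading of the method, not the program used in the project: the function name, the synthetic data, and the fixed critical F value are hypothetical, and the candidate with the largest absolute semi-partial correlation is chosen (an assumption, since its square is the addition to $R^2$).

```python
import numpy as np

def square_root_selection(corr, m, f_crit):
    """Forward selection on a correlation matrix (dependent variable last).

    Returns (variable index, R^2 after adding it, F) for each accepted variable.
    """
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    dep = n - 1
    cols = np.zeros((n, 0))            # accepted columns of the square root matrix
    remaining = list(range(n - 1))
    accepted = []
    r2_prev = 0.0
    while remaining:
        p = cols.shape[1] + 1
        if m - 1 - p <= 0:
            break                      # no error degrees of freedom left
        best = None
        for j in remaining:
            # Diagonal element squared: part of variable j not yet accounted for.
            d2 = 1.0 - cols[j] @ cols[j]
            if d2 <= 1e-12:
                continue               # j is numerically a combination of earlier picks
            col = (corr[:, j] - cols @ cols[j]) / np.sqrt(d2)
            if best is None or abs(col[dep]) > abs(best[1][dep]):
                best = (j, col)
        if best is None:
            break
        j, col = best
        r2 = r2_prev + col[dep] ** 2   # addition to R^2 is the squared semi-partial
        f = np.inf if r2 >= 1.0 else (r2 - r2_prev) * (m - 1 - p) / (1.0 - r2)
        if f < f_crit:
            break                      # best remaining candidate is not significant
        accepted.append((j, r2, f))
        cols = np.column_stack([cols, col])
        remaining.remove(j)
        r2_prev = r2
    return accepted

# Synthetic data: Y depends on variables 0 and 2; variables 1 and 3 are noise.
rng = np.random.default_rng(0)
m = 100
X = rng.normal(size=(m, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=m)
corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)

# f_crit = 4.0 is a hypothetical stand-in for a tabulated critical F value.
steps = square_root_selection(corr, m, f_crit=4.0)
for j, r2, f in steps:
    print(f"variable {j}: R^2 = {r2:.4f}, F = {f:.1f}")
```

Each step needs only the correlation matrix and the columns already built, which is what keeps the number of computations far below that of an exhaustive search.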




3.4.3  Confidence  Interval  Prediction 
To obtain estimates of the variances required for confidence interval
prediction, namely $\mathrm{var}[\hat{Y} - E(Y)]$ and $\mathrm{var}(\hat{Y} - Y)$, the following equations are
applicable:

$$\mathrm{var}[\hat{Y} - E(Y)] = s^2 \Big[ \frac{1}{m} + \sum_{i,j=1}^{r} c_{ij}\,(X_i^* - \bar{X}_i)(X_j^* - \bar{X}_j) \Big] \tag{B-31}$$

$$\mathrm{var}(\hat{Y} - Y) = s^2 + \mathrm{var}[\hat{Y} - E(Y)] \tag{B-32}$$

where the $c_{ij}$ are known as Gauss multipliers. They can be obtained from the
inverse correlation matrix by the following equations:

$$c_{ii} = \frac{a_{nn}\, a_{ii} - a_{ni}^2}{(m-1)\, s_i^2\, a_{nn}} \tag{B-33}$$
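The multivariate variance formulas can be sketched as follows. The sketch takes the Gauss multipliers to be the elements of the inverse of the matrix of corrected sums of squares and cross-products, which is their usual meaning and agrees numerically with the diagonal form of equation B-33; this reading, the function name, and the synthetic data are assumptions made for the illustration.

```python
import numpy as np

def prediction_variances(X, y, x_star):
    """Estimated var[Y_hat - E(Y)] and var(Y_hat - Y) at the point x_star."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_star = np.asarray(x_star, dtype=float)
    m, r = X.shape
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                       # deviations from the sample means
    yc = y - y.mean()
    C = np.linalg.inv(Xc.T @ Xc)         # Gauss multipliers c_ij (assumed definition)
    b = C @ Xc.T @ yc                    # regression coefficients b_1 .. b_r
    resid = yc - Xc @ b
    s2 = resid @ resid / (m - r - 1)     # squared standard error of estimate
    d = x_star - x_bar
    var_mean = s2 * (1.0 / m + d @ C @ d)  # variance for the mean value of Y
    var_indiv = s2 + var_mean              # variance for an individual Y value
    return var_mean, var_indiv, C

# Synthetic example: 30 observations, 2 independent variables.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
y = 4.0 + X @ np.array([1.2, -0.7]) + rng.normal(scale=0.4, size=30)
x_star = np.array([0.5, -1.0])
var_mean, var_indiv, C = prediction_variances(X, y, x_star)
print(var_mean, var_indiv)
```

The square roots of the two variances, multiplied by the appropriate value of Student's t with (m-r-1) degrees of freedom, give the half-widths of the two confidence intervals.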


